Exploration server on music in Saarland

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Codebook Design for Speech Guided Car Infotainment Systems

Internal identifier: 000623 (Main/Exploration); previous: 000622; next: 000624

Authors: Martin Raab [Germany]; Rainer Gruhn [Germany]; Elmar Noeth [Germany]

RBID : ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC

Abstract: In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving main-language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
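
The abstract does not spell out how the new codebook is built, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a codebook is a list of mean vectors, that vector quantization picks the nearest codeword by Euclidean distance, and that a multilingual codebook can be formed by keeping every main-language codeword and adding the additional-language codewords that lie farthest from what is already covered (a farthest-point heuristic chosen here for illustration). The function names and the toy data are hypothetical.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def extend_codebook(main, additional, n_extra):
    """Keep all main-language codewords, then greedily add up to `n_extra`
    additional-language codewords, each time taking the candidate that is
    farthest from its nearest codeword in the codebook built so far.

    Assumption: this farthest-point rule is a stand-in for illustration only.
    """
    extended = list(main)
    candidates = list(additional)
    for _ in range(min(n_extra, len(candidates))):
        best = max(
            candidates,
            key=lambda c: min(math.dist(c, m) for m in extended),
        )
        extended.append(best)
        candidates.remove(best)
    return extended


# Toy two-dimensional "codebooks" (hypothetical values).
main_codebook = [(0.0, 0.0), (1.0, 0.0)]
other_codebook = [(0.9, 0.1), (5.0, 5.0), (0.0, 4.0)]

extended = extend_codebook(main_codebook, other_codebook, n_extra=2)
print(extended)
print(nearest_codeword((4.5, 4.0), extended))
```

Because the main-language codewords are never removed or moved in this sketch, quantization of main-language input is unchanged for any vector whose nearest codeword was already in the main codebook, which mirrors the constraint of conserving main-language performance stated in the abstract.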

DOI: 10.1007/978-3-540-69369-7_6



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</author>
<author>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
</author>
<author>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
<idno type="url">https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001240</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001240</idno>
<idno type="wicri:Area/Istex/Curation">001151</idno>
<idno type="wicri:Area/Istex/Checkpoint">000447</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000447</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Raab M:codebook:design:for</idno>
<idno type="wicri:Area/Main/Merge">000623</idno>
<idno type="wicri:Area/Main/Curation">000623</idno>
<idno type="wicri:Area/Main/Exploration">000623</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
<placeName>
<settlement type="city">Erlangen</settlement>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Moyenne-Franconie</region>
</placeName>
</affiliation>
<affiliation></affiliation>
</author>
<author>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
<orgName type="university">Université d'Ulm</orgName>
</affiliation>
</author>
<author>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
<placeName>
<settlement type="city">Erlangen</settlement>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Moyenne-Franconie</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="Teeft" xml:lang="en">
<term>Additional gaussians</term>
<term>Additional language</term>
<term>Additional languages</term>
<term>Algorithm</term>
<term>Baseline</term>
<term>City names</term>
<term>Codebook</term>
<term>Codebook design</term>
<term>Codebooks</term>
<term>Database</term>
<term>English codebook</term>
<term>Experimental setup</term>
<term>Foreign names</term>
<term>Future work</term>
<term>Gaussians</term>
<term>German codebook</term>
<term>Gruhn</term>
<term>Hiwire</term>
<term>Hiwire data</term>
<term>Hiwire database</term>
<term>Human input</term>
<term>Infotainment</term>
<term>Infotainment scenario</term>
<term>Infotainment systems</term>
<term>Initial codebooks</term>
<term>Main language</term>
<term>Main language codebook</term>
<term>Main language performance</term>
<term>Maximum accuracy</term>
<term>Multilingual</term>
<term>Multilingual input</term>
<term>Multilingual recognition</term>
<term>Multilingual speech recognition</term>
<term>Multiple languages</term>
<term>Music titles</term>
<term>Mwcs</term>
<term>Native english codebook</term>
<term>Native speech</term>
<term>Nearest neighbor connections</term>
<term>Nonnative speech</term>
<term>Other words</term>
<term>Quantization</term>
<term>Raab</term>
<term>Results show</term>
<term>Same time</term>
<term>Sound patterns</term>
<term>Speech recognition</term>
<term>Speech recognizers</term>
<term>Such collections</term>
<term>Such systems</term>
<term>Training samples</term>
<term>Vector quantization</term>
<term>Word accuracies</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving main-language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>Bade-Wurtemberg</li>
<li>Bavière</li>
<li>District de Moyenne-Franconie</li>
<li>District de Tübingen</li>
</region>
<settlement>
<li>Erlangen</li>
<li>Ulm</li>
</settlement>
<orgName>
<li>Université d'Ulm</li>
</orgName>
</list>
<tree>
<country name="Allemagne">
<region name="Bade-Wurtemberg">
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</region>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</country>
</tree>
</affiliations>
</record>
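
The record above can be read with a standard XML parser, with one caveat: as shown, it uses the `wicri:` prefix without declaring a namespace for it, so a namespace-aware parser reports an unbound prefix. The sketch below works on a shortened copy of the record and binds the prefix to a placeholder URI before parsing. The URI `urn:example:wicri` and the helper names are assumptions made for this example and do not come from the source.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above (some attributes and elements omitted).
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author>
<name sortKey="Raab, Martin" first="Martin" last="Raab">Martin Raab</name>
</author>
<author>
<name sortKey="Gruhn, Rainer" first="Rainer" last="Gruhn">Rainer Gruhn</name>
</author>
<author>
<name sortKey="Noeth, Elmar" first="Elmar" last="Noeth">Elmar Noeth</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI (hypothetical), used only so the prefix is bound.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Declare the wicri: prefix on the root element, then parse."""
    bound = text.replace("<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1)
    return ET.fromstring(bound)


def summarize(root):
    """Collect title, authors, year and DOI from the TEI header."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "year": pub_stmt.find("date").get("when"),
        "doi": pub_stmt.findtext("idno[@type='doi']"),
    }


summary = summarize(parse_record(RECORD))
print(summary)
```

The same `summarize` function should apply to the output of the `HfdSelect` commands in the following section, provided the `wicri:` prefix is bound in the same way first.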

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000623 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000623 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC
   |texte=   Codebook Design for Speech Guided Car Infotainment Systems
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024